class: title-slide, left, bottom # Feedforward neural networks from a statistical-modelling perspective ---- ## **Andrew McInerney**, **Kevin Burke** ### University of Limerick #### IWSM, 19 July 2023 --- # Feedforward Neural Networks -- .left-column[ <img src="data:image/png;base64,#img/FNN.png" width="100%" height="110%" style="display: block; margin: auto;" /> ] <br> -- .right-column[ <img src="data:image/png;base64,#img/nneq1.png" width="95%" height="100%" style="display: block; margin: auto 0 auto auto;" /> ] --- count: false # Feedforward Neural Networks .left-column[ <img src="data:image/png;base64,#img/FNN.png" width="100%" height="110%" style="display: block; margin: auto;" /> ] <br> .right-column[ <img src="data:image/png;base64,#img/nneq2.png" width="95%" height="100%" style="display: block; margin: auto 0 auto auto;" /> ] --- count: false # Feedforward Neural Networks .left-column[ <img src="data:image/png;base64,#img/FNN.png" width="100%" height="110%" style="display: block; margin: auto;" /> ] <br> .right-column[ <img src="data:image/png;base64,#img/nneq3.png" width="95%" height="100%" style="display: block; margin: auto 0 auto auto;" /> ] --- count: false # Feedforward Neural Networks .left-column[ <img src="data:image/png;base64,#img/FNN.png" width="100%" height="110%" style="display: block; margin: auto;" /> ] <br> .right-column[ <img src="data:image/png;base64,#img/nneq4.png" width="95%" height="100%" style="display: block; margin: auto 0 auto auto;" /> ] --- count: false # Feedforward Neural Networks .left-column[ <img src="data:image/png;base64,#img/FNN.png" width="100%" height="110%" style="display: block; margin: auto;" /> ] <br> .right-column[ <img src="data:image/png;base64,#img/nneq5.png" width="95%" height="100%" style="display: block; margin: auto 0 auto auto;" /> ] --- # Data Application -- ### Insurance Data (Kaggle) -- 1,338 beneficiaries enrolled in an insurance plan -- Response: `charges` -- 6 Explanatory Variables: .pull-left[ - 
`age` - `sex` - `bmi` ] .pull-left[ - `children` - `smoker` - `region` ] --- # R Implementation -- Many packages available to fit neural networks in R. <br> -- Some popular packages are: -- - `nnet` -- - `neuralnet` -- - `keras` -- - `torch` --- # R Implementation: nnet -- ```r library(nnet) nn <- nnet(charges ~ ., data = insurance, size = 2, maxit = 2000, linout = TRUE) summary(nn) ``` -- ```{.bg-primary} ## a 8-2-1 network with 21 weights ## b->h1 i1->h1 i2->h1 i3->h1 i4->h1 i5->h1 i6->h1 i7->h1 i8->h1 ## 1.39 -0.43 0.08 0.03 -0.08 -3.16 0.07 0.11 0.15 ## b->h2 i1->h2 i2->h2 i3->h2 i4->h2 i5->h2 i6->h2 i7->h2 i8->h2 ## 6.31 0.04 0.13 2.19 -0.11 -6.19 0.15 0.12 0.14 ## b->o h1->o h2->o ## 1.08 -4.82 2.45 ## [...] ``` --- # Proposed Solution: interpretnn -- .left-column[ <br> <img src="data:image/png;base64,#img/interpretnn.png" width="80%" style="display: block; margin: auto;" /> ] -- .right-column[ <br> <br> ```r # install.packages("devtools") library(devtools) install_github("andrew-mcinerney/interpretnn") ``` ] --- # Statistical Perspective -- $$ y_i = \text{NN}(x_i) + \varepsilon_i, $$ -- where $$ \varepsilon_i \sim N(0, \sigma^2) $$ <br> -- $$ \ell(\theta, \sigma^2)= -\frac{n}{2}\log(2\pi\sigma^2)-\frac{1}{2\sigma^2}\sum_{i=1}^n(y_i-\text{NN}(x_i))^2 $$ --- # Uncertainty Quantification Then, as `\(n \to \infty\)` $$ \hat{\theta} \sim N[\theta, \Sigma = \mathcal{I}(\theta)^{-1}] $$ -- Estimate `\(\Sigma\)` using $$ \hat{\Sigma} = I_o(\hat{\theta})^{-1} $$ -- <br> However, inverting `\(I_o(\hat{\theta})\)` can be problematic in neural networks. --- # Redundancy -- Redundant hidden nodes can lead to issues of unidentifiability for some of the parameters (Fukumizu 1996). <br> -- Redundant hidden nodes `\(\implies\)` Singular information matrix. <br> -- Trade-off between model flexibility and interpretability. 
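To make the identifiability issue concrete, here is a sketch for a single hidden layer with activation `\(\phi\)` (the notation `\(\gamma_k\)`, `\(b_k\)` is introduced here for illustration, since the model equation appears only as an image on the earlier slides):

$$
\text{NN}(x) = \gamma_0 + \sum_{k=1}^q \gamma_k \, \phi(b_k + \omega_k^T x)
$$

If node `\(k\)` is redundant with `\(\omega_k = 0\)`, then `\(\phi(b_k)\)` is constant, and the likelihood depends on `\((\gamma_0, \gamma_k, b_k)\)` only through the sum `\(\gamma_0 + \gamma_k\phi(b_k)\)`. These parameters are not separately identified, so the corresponding rows of `\(I_o(\hat{\theta})\)` are linearly dependent and the matrix is singular.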
--- # Significance Testing -- .pull-left[ <img src="data:image/png;base64,#img/FNN1.png" width="100%" style="display: block; margin: auto;" /> ] --- count: false # Significance Testing .pull-left[ <img src="data:image/png;base64,#img/FNN2.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ Wald test: {{content}} ] -- $$ `\begin{equation} \omega_j = (\omega_{j1},\omega_{j2},\dotsc,\omega_{jq})^T \end{equation}` $$ {{content}} -- $$ `\begin{equation} H_0: \omega_j = 0 \end{equation}` $$ {{content}} -- $$ `\begin{equation} (\hat{\omega}_{j} - \omega_j)^T\Sigma_{\hat{\omega}_{j}}^{-1}(\hat{\omega}_{j} - \omega_j) \sim \chi^2_q \end{equation}` $$ {{content}} --- # Insurance: Model Summary ```r intnn <- interpretnn(nn) summary(intnn) ``` -- ```{.bg-primary} ## Coefficients: ## Weights | X^2 Pr(> X^2) ## age (-0.43***, 0.04) | 41.4363 1.01e-09 *** ## sex.male (0.08*, 0.13) | 5.5055 6.38e-02 . ## bmi (0.03, 2.19***) | 105.6106 0.00e+00 *** ## children (-0.08***, -0.11.) | 19.0146 7.43e-05 *** ## smoker.yes (-3.16***, -6.19***) | 250.6393 0.00e+00 *** ## region.northwest (0.07., 0.15) | 2.8437 2.41e-01 ## region.southeast (0.11*, 0.12) | 6.2560 4.38e-02 * ## region.southwest (0.15**, 0.14) | 10.8218 4.47e-03 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 ``` --- # Insurance: Model Summary ```r plotnn(intnn) ``` -- <img src="data:image/png;base64,#img/plotnn-1.png" width="90%" style="display: block; margin: auto;" /> --- # Covariate-Effect Plots $$ `\begin{equation} \widehat{\overline{\text{NN}}}_j(x) = \frac{1}{n}\sum_{i=1}^n \text{NN}(x_{i,1}, \ldots,x_{i,j-1},x, x_{i,j+1}, \ldots) \end{equation}` $$ -- Propose covariate-effect plots of the following form: -- $$ `\begin{equation} \hat{\beta}_j(x,d) =\widehat{\overline{\text{NN}}}_j(x + d) -\widehat{\overline{\text{NN}}}_j(x) \end{equation}` $$ -- Usually set `\(d = \text{SD}(x_j)\)` --- # Insurance: Covariate Effects ```r plot(intnn, conf_int = TRUE, which = c(1, 4)) ``` -- .pull-left[ <!-- --> ] -- .pull-right[ <!-- --> ] --- # Summary -- * Statistical-modelling approach to neural networks <br> -- * Provide an R package to improve interpretability <br> -- * Make neural networks more familiar in a statistical-modelling context --- class: bigger # References * McInerney, A., & Burke, K. (2022). A statistically-based approach to feedforward neural network model selection. arXiv preprint arXiv:2207.04248. * McInerney, A., & Burke, K. (2023). Interpreting feedforward neural networks as statistical models. In Preparation. ### R Package ```r devtools::install_github("andrew-mcinerney/interpretnn") ```
<font size="5.5">andrew-mcinerney</font>
<font size="5.5">@amcinerney_</font>
<font size="5.5">andrew.mcinerney@ul.ie</font>